Scaling up AI

#artificialintelligence

Data parallelism obviously remains an option: separate batches of data can be processed on model instances running on different GPUs, which speeds up training. But it is by no means possible to take, say, the VGG16 network, multiply the number of layers by 10, the number of feature maps by 10, and the size of each layer by 10, throw it at 1000 GPUs, and expect it to train successfully. Aside from the difficulty of implementing the parallel execution efficiently, such a network would need something like exp(1000) times more data and training iterations, which is not feasible. This is part of the reason why the AlexNet and VGG16 architectures, which are now a few years old, are still the base architectures for many applications, while the biggest deep learning models trained today are at most one order of magnitude larger.
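To make the data-parallel case concrete, here is a minimal sketch in plain Python. It is not taken from any particular framework: the "workers" are simulated sequentially, the model is a toy one-variable linear regression, and the names `shard_gradient` and `data_parallel_step` are hypothetical. The point it illustrates is that splitting a batch across equally sized shards and averaging the per-shard gradients gives the same update as processing the whole batch on one device, which is why this form of parallelism speeds up training without changing the model itself.

```python
# Simulated data parallelism: each "worker" stands in for one GPU.
# Assumption: a toy model y ≈ w*x + b trained with mean squared error.

def shard_gradient(w, b, shard):
    """Gradient of the mean squared error on one shard of (x, y) pairs."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb

def data_parallel_step(w, b, batch, num_workers, lr):
    """One synchronous SGD step: split the batch, average the shard gradients."""
    size = len(batch) // num_workers  # assumes the batch divides evenly
    shards = [batch[i * size:(i + 1) * size] for i in range(num_workers)]
    # On real hardware these gradient computations would run concurrently.
    grads = [shard_gradient(w, b, s) for s in shards]
    gw = sum(g[0] for g in grads) / num_workers
    gb = sum(g[1] for g in grads) / num_workers
    return w - lr * gw, b - lr * gb

# Toy data drawn from y = 3x + 1.
batch = [(float(x), 3.0 * x + 1.0) for x in range(8)]
w1, b1 = data_parallel_step(0.0, 0.0, batch, num_workers=1, lr=0.01)
w4, b4 = data_parallel_step(0.0, 0.0, batch, num_workers=4, lr=0.01)
print(abs(w1 - w4) < 1e-9, abs(b1 - b4) < 1e-9)
```

With one worker or four, the parameters after the step agree up to floating-point rounding. What the sketch does not capture is the harder case described above: making the network itself much larger, where more devices alone do not help.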